Tutorial step 9: Aftermath
Our filter is now fully functional. So what else can we do with it?
Pixels-in-place optimization
As it stands, our implementation always uses a separate output buffer. When 2x enlargement is off, that extra buffer is unnecessary. We could gain some speed by altering the paramProc and runProc functions so that the output buffer overlaps the input buffer: paramProc simply sets fa->dst.offset to fa->src.offset and returns 0, and runProc treats src and dst as one buffer, reading and writing each pixel through a single pointer. This makes any assembly implementation considerably easier, because only one pointer is needed, and it can improve cache hit rates by shrinking the filter's working set.
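The in-place arrangement can be sketched in C roughly as follows. This is a minimal, self-contained model, not SDK code: SketchBitmap and SketchActivation are hypothetical stand-ins for the SDK's bitmap and FilterActivation structures (only the fields used here), sketch_param and sketch_run are hypothetical names playing the paramProc and runProc roles, and the pixel operation (inverting RGB in 32-bit pixels) is a placeholder rather than this tutorial's filter. Because the mock has no host, sketch_param also aliases the data pointer to model the shared buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SDK's bitmap structure (not the real
   one): only the fields this sketch needs, 32-bit pixels assumed. */
typedef struct {
    uint32_t *data;   /* start of the underlying buffer */
    int w, h;         /* frame size in pixels */
    ptrdiff_t pitch;  /* bytes from the start of one row to the next */
    ptrdiff_t offset; /* bytes from data to the first row */
} SketchBitmap;

/* Hypothetical stand-in for FilterActivation. */
typedef struct {
    SketchBitmap src;
    SketchBitmap dst;
} SketchActivation;

/* paramProc role: describe the output as lying on top of the input. */
static long sketch_param(SketchActivation *fa) {
    fa->dst.w = fa->src.w;           /* no enlargement: same size */
    fa->dst.h = fa->src.h;
    fa->dst.pitch = fa->src.pitch;
    fa->dst.offset = fa->src.offset; /* the step described in the text */
    fa->dst.data = fa->src.data;     /* mock only: models the shared buffer */
    return 0;                        /* returns 0, as the text describes */
}

/* runProc role: one pointer both reads and writes every pixel. */
static int sketch_run(const SketchActivation *fa) {
    uint8_t *row = (uint8_t *)fa->src.data + fa->src.offset;
    for (int y = 0; y < fa->src.h; ++y) {
        uint32_t *p = (uint32_t *)row;
        for (int x = 0; x < fa->src.w; ++x)
            p[x] ^= 0x00FFFFFFu;     /* placeholder: invert RGB in place */
        row += fa->src.pitch;        /* step to the next row */
    }
    return 0;
}
```

Note that sketch_run never touches fa->dst: once both bitmaps describe the same memory, reading and writing through src alone is enough, which is what lets an assembly version get by with a single pointer.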
This can cause trouble for MMX implementations: although the src pointer is always 4-byte aligned, it is not always 8-byte aligned. In that case, modifying the dst bitmap or requesting a separate output buffer with an aligned offset and pitch may help. The start of a buffer is always aligned to an 8-byte boundary, so if both the offset and the pitch are multiples of 8, every row also begins on an 8-byte boundary.
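The alignment arithmetic can be checked with two small helpers. Both are hypothetical illustrations, not SDK functions: align8 rounds a byte count up to a multiple of 8 (one MMX quadword), and rows_aligned8 reports whether a bitmap's first row lands on an 8-byte boundary and its pitch keeps every later row there. How you actually hand an aligned offset and pitch to the host is not shown, since that depends on the SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a byte count up to the next multiple
   of 8. Intended for non-negative sizes such as a row width. */
static ptrdiff_t align8(ptrdiff_t n) {
    return (n + 7) & ~(ptrdiff_t)7;
}

/* Hypothetical helper: nonzero if the first row (base + offset) is
   8-byte aligned and the pitch is a multiple of 8, so every row start
   base + offset + y * pitch is 8-byte aligned as well. */
static int rows_aligned8(const void *base, ptrdiff_t offset, ptrdiff_t pitch) {
    uintptr_t first = (uintptr_t)base + (uintptr_t)offset;
    return (first % 8 == 0) && (pitch % 8 == 0);
}
```

For example, with 32-bit pixels and an odd width such as 3, a tightly packed row is 12 bytes: a multiple of 4 but not of 8, so alternate rows fall off the 8-byte boundary. Padding the pitch with align8(3 * 4) gives 16 bytes, which keeps every row aligned as long as the offset is also a multiple of 8.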
VirtualDub external filter SDK 1.05 | ©1999-2001 Avery Lee <phaeron@virtualdub.org> |